ABSTRACT


The suggested bat origin of Middle East respiratory syndrome coronavirus (MERS-CoV) has
revitalized studies of the interspecies transmission potential of other bat-derived coronaviruses.
Bat coronavirus (BatCoV) HKU9 is an important betacoronavirus (betaCoV) that is
phylogenetically affiliated with the same genus as MERS-CoV. Bat-surveillance data
indicate that BatCoV HKU9 is widespread and circulating in bats. This highlights
the need to characterize the virus for its potential to cross species barriers. The receptor
binding domain (RBD) of the coronavirus spike (S) recognizes host receptors to mediate virus 
entry and is therefore a key factor determining the viral tropism and transmission capacity. In 
this study, the putative S RBD of BatCoV HKU9 (HKU9-RBD), which is homologous to other 
betaCoV RBDs that have been structurally and functionally defined, was characterized via a 
series of biophysical and crystallographic methods. By using surface plasmon resonance, we 
demonstrated that HKU9-RBD binds neither the SARS-CoV receptor ACE2 nor the MERS-CoV
receptor CD26. We further solved the atomic structure of HKU9-RBD, which, as expected, is
composed of a core and an external subdomain. The core subdomain fold resembles those of other
betaCoV RBDs, whereas the external subdomain is structurally unique, containing a single helix,
which explains the inability of HKU9-RBD to react with either ACE2 or CD26. By comparing
the available RBD structures, we further propose a homologous inter-subdomain binding mode
in betaCoV RBDs that anchors the external subdomain to the core subdomain. The revealed
RBD features shed light on the evolutionary route of betaCoVs.
Introduction


Coronaviruses are large, enveloped, positive-stranded RNA viruses that can infect birds,
other animals and humans. Taxonomically, these viruses belong to the family Coronaviridae
within the order Nidovirales. Since the 1930s, when the first coronavirus, infectious
bronchitis virus, was isolated from chickens, coronaviruses have expanded into four genera:
Alpha-, Beta-, Gamma- and Deltacoronavirus. Of these, betacoronaviruses
(betaCoVs) have drawn worldwide attention because of their pathogenic capacity, their
potential to cause a global pandemic of human infections, and their wide spread and
enormous species diversity in bats [6, 9-11]. In 2002-2003, one representative
betaCoV, severe acute respiratory syndrome coronavirus (SARS-CoV), first emerged in
China and then rapidly spread to other countries, leading to >8000 infection cases and >800
deaths. In 2012, another betaCoV, named Middle East respiratory syndrome coronavirus
(MERS-CoV), was first identified in Saudi Arabia. Despite global efforts to
control its transmission, MERS-CoV has continued to spread to multiple countries in the Middle East,
Europe, North America and Asia, causing 1,800 confirmed infections and at least 640 deaths as
of June 23, 2016 (based on the latest statistics released by the World Health Organization).
Meanwhile, the human-infective betaCoV HKU1 was isolated from a patient with respiratory
disease in Hong Kong. These unexpected outbreaks of betaCoV infection have posed a severe
threat to global public health and led to enormous socioeconomic disruption.


Phylogenetically, betaCoVs can be further categorized into four evolutionary
lineages/subgroups (A, B, C and D). SARS-CoV is a typical lineage B member, while MERS-CoV
belongs to lineage C. Despite belonging to different subgroups, these two betaCoVs likely share
similar interspecies transmission routes, “jumping” from their natural host(s) to intermediate
adaptive animal(s) and finally to humans. Current evidence clearly shows that SARS-CoV
originated from bats [22, 23] and possibly adapted in civets or raccoon dogs before it infected
humans. Given the close phylogenetic relationship between MERS-CoV and a variety of
bat-derived coronaviruses (BatCoVs) (e.g. HKU4, HKU5 and those recently identified in the
Middle East, Africa, Europe and Asia), it is widely accepted that the current MERS epidemic
represents another bat-to-human transmission event involving a betaCoV, although its
intermediate host has been shown, this time, to be dromedaries. Notably, two recent studies
reported that BatCoV HKU4 can recognize human CD26, the MERS-CoV receptor, as a
functional entry receptor, indicating its potential adaptation for human infection. These
continuous yet unpredictable events of betaCoVs repeatedly crossing species barriers
highlight the pressing need to study other members of the genus for the characteristics
relevant to interspecies transmission.


The coronavirus spike (S) protein, located on the envelope surface of the virion, mediates
receptor recognition and membrane fusion and is therefore a key factor determining
the virus tropism for a specific species. In most cases, coronaviral S is further cleaved
into S1 and S2 subunits, with the receptor-binding capacity residing in the S1 subunit. The
receptor binding domain (RBD) of betaCoVs that directly engages the receptor is commonly
located in the C-terminal half of S1 (C-terminal domain, CTD), as in SARS-CoV, MERS-CoV
and BatCoV HKU4, though in rare cases, such as mouse hepatitis virus (MHV),
the RBD was identified in the S1 N-terminal domain (NTD). We previously characterized
the structure of the MERS-CoV RBD (MERS-RBD) as a relatively independent entity composed of
a core and an external subdomain. The latter subdomain, which is topologically an insertion
between two scaffold strands of the core subdomain, presents a flat 4-stranded β-sheet surface to
contact the CD26 receptor. A similar topological arrangement of the core and external
subdomains into a structural unit for receptor engagement was also observed in the SARS-CoV RBD
(SARS-RBD). Nevertheless, SARS-RBD exhibits a unique loop-dominated external fold to
recognize human angiotensin converting enzyme 2 (ACE2) as its receptor. These observations
indicate that the homologous RBD regions of betaCoVs are a key determinant of receptor
adaptation and cross-species transmission.


BatCoV HKU9 is a representative betaCoV of lineage D. The virus was first identified in
bats in 2007 by next-generation sequencing (NGS). Although attempts to isolate live virus have
so far been unsuccessful, its genomes are widespread in different bat species. Despite concern
about its interspecies transmission potential, the features of its S protein, especially of the
homologous RBD region (HKU9-RBD), remain unknown. Characterizing them would be an indispensable
step towards understanding the pathogenesis of BatCoV HKU9. In addition, the atomic structure
of HKU9-RBD would provide requisite information for understanding the evolution of betaCoVs.
Notably, MERS-RBD and SARS-RBD share a conserved core structure but differ in the
external fold to engage different receptors. Sequence features of betaCoV RBDs
suggest that this scheme of subdomain arrangement might extend to the whole
betacoronavirus genus, regardless of species. This notion was supported by our recent study on
the BatCoV HKU4 RBD (HKU4-RBD), which exhibits a structure closely resembling that of MERS-RBD.


In this study, we report the structural and functional characterization of HKU9-RBD. The
solved structure, as expected, contains a core subdomain homologous to those observed in other
betaCoV RBD structures, and an external subdomain that is mainly α-helical. This unique
structural feature explains its inability to react with either human CD26 or ACE2, as
observed in our surface plasmon resonance (SPR) assay. By comparing available RBD
structures, we further show that the detailed interactions anchoring the external subdomain to
the core subdomain share similar patterns in betaCoV RBDs. We believe this
core/external interaction mode represents another structural feature of S that has been preserved
during the evolution of betaCoVs, in addition to the conserved fold of the core
subdomain. Our study therefore further supports the notion that betaCoV S proteins originate from a
common ancestor and have evolved divergently, mainly in the RBD external region, to engage different
receptors, thereby paving the way for potential interspecies transmission.


Materials and Methods 


Plasmid construction 


The plasmids used for protein expression were individually constructed by inserting the
coding sequences for HKU9-RBD (S residues S355-N521, GenBank accession number:
EF065513), MERS-RBD (S residues E367-Y606, accession number: JX869050), SARS-RBD (S
residues R306-F527, accession number: NC_004718), human CD26 (residues S39-P766,
accession number: NP_001926) and human ACE2 (residues S19-D615, accession number:
BAJ21180) into the EcoRI and XhoI restriction sites of a previously modified pFastBac1 vector,
which was engineered to include an N-terminal gp67 signal peptide coding sequence. For each
protein, an engineered C-terminal hexa-His tag was used to facilitate protein purification. To
prepare mouse IgG Fc fragment (mFc)-fused proteins, the coding sequences of MERS-RBD,
SARS-RBD and HKU9-RBD were each fused with the mFc sequence and cloned into the
pCAGGS vector.
Protein expression and purification


The proteins used for crystallization and SPR analysis were prepared with the Bac-to-Bac
baculovirus expression system (Invitrogen) according to the manufacturer’s instructions. In
brief, the verified pFastBac1 vector was transformed into DH10Bac competent cells to
generate the recombinant bacmid. The bacmid was then extracted and transfected into Sf9 cells
to prepare the baculovirus stocks. Sf9 cells were further used to amplify the baculoviruses, while
High5 cells were used to express the proteins.


High5 cell cultures were harvested 48 h post infection. In total, 4 L of cell culture for each
protein was collected and centrifuged at 6,500 rpm for 1.5 h to remove cell debris. After filtration
through a 0.22 μm membrane, the supernatant was passed through two 5 ml HisTrap HP columns (GE
Healthcare) to capture the protein of interest. For MERS-RBD, SARS-RBD, human
CD26 and human ACE2, the bound proteins were eluted from HisTrap with 20 mM, 50
mM and 300 mM imidazole in 20 mM Tris-HCl and 150 mM NaCl buffer (pH 8.0). After SDS-PAGE
analysis, fractions eluted with 300 mM imidazole were pooled and further
purified on a Superdex™ 200 column (GE Healthcare). For HKU9-RBD, the bound proteins were
eluted from HisTrap with 20 mM, 50 mM and 300 mM imidazole in 20 mM HEPES and 150
mM NaCl buffer (pH 7.0). Fractions eluted with 50 mM and 300 mM imidazole were pooled
separately and dialyzed overnight against 5 L of 20 mM HEPES and 150 mM NaCl buffer (pH
7.0) to remove imidazole. The dialysates were concentrated and further purified on a Superdex™
200 column (GE Healthcare). Each protein was stored in the buffer used for its purification.
To prepare mFc-fused proteins in a mammalian cell expression system, the recombinant pCAGGS
plasmids were confirmed by Sanger sequencing and then prepared with the EndoFree Maxi
Plasmid Kit (TIANGEN, Beijing). Each recombinant plasmid was transfected into 293T cells
with 50 μg plasmid DNA per T75 plate using polyethylenimine (PEI, Polysciences Inc.). After 5 h
incubation, the transfected cells were washed twice with PBS and the medium was replaced with
serum-free DMEM. The cells were maintained for three days, after which the supernatant was
harvested and replaced with fresh DMEM, and the cells were maintained for another four days. The
harvested supernatants were pooled, concentrated and mixed with two volumes of 20 mM trisodium
phosphate (pH 7.0). The mixture was passed through a 5 ml HiTrap™ Protein A HP prepacked
column (GE Healthcare) to capture the protein of interest. After removal of impure
proteins with 20 mM trisodium phosphate (pH 7.0), the bound protein was eluted from the
column with 100 mM glycine (pH 3.0). Each fraction was neutralized with 1 M Tris-HCl (pH
9.0). After SDS-PAGE analysis, the eluted fractions containing the protein of interest were pooled
and concentrated. The buffer of each protein was then exchanged to PBS (pH 7.0) for further
experiments.


SPR assay 


The BIAcore® experiments were carried out at 25 °C using BIAcore® 3000 or BIAcore® T100
machines with CM5 chips (GE Healthcare). For all measurements, HBS-EP buffer
consisting of 10 mM HEPES, pH 7.4, 150 mM NaCl and 0.005% v/v Tween® 20 was used, and
all proteins were exchanged into this buffer in advance. First, the HKU9-RBD, MERS-RBD
and SARS-RBD proteins expressed in insect cells were used for SPR assays on a BIAcore®
3000 machine. BSA (negative control), HKU9-RBD, MERS-RBD and SARS-RBD proteins
were immobilized on the chip at about 1000 response units (RU), according to the manufacturer’s
amine-coupling protocol (GE Healthcare). Serial concentrations of human CD26 (0 and
19.5 to 5000 nM) or human ACE2 (0 and 39 to 625 nM) were then flowed over the chip surface. After
each cycle, the sensor surface was regenerated by a short treatment with 10 mM NaOH. The
equilibrium dissociation constants (binding affinity, KD values) were analyzed using
BIAevaluation software (BIAcore™). To exclude the possibility that HKU9-RBD was
non-functional because of immobilization or the absence of important post-translational
modifications, we purified mFc-fused HKU9-RBD proteins expressed in mammalian
cells and assessed their binding to CD26 or ACE2 using a capture SPR
method on a BIAcore® T100 system. The CM5 chip was immobilized with anti-mouse antibody
in flow cells (FC) 1 and 2. The mFc-fused RBD proteins were then injected and captured on
FC 2, while FC 1 was used as a negative control. Human CD26 or human ACE2 proteins were
then injected and the binding responses were measured. The immobilized anti-mouse antibody
was regenerated with 10 mM glycine, pH 1.7 (GE Healthcare).
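The KD values in this study come from the BIAevaluation software, whose internal fitting is not described here. As a hedged illustration only, a common way to estimate an equilibrium KD from SPR data is to fit steady-state responses to a 1:1 (Langmuir) model, Req = Rmax·C/(KD + C). The plain-Python sketch below assumes that model and scans KD on a log-spaced grid, taking the closed-form least-squares Rmax at each KD. The function names are hypothetical, and the responses are synthetic: they are generated with KD set to 52.8 nM and Rmax to 100 RU, not taken from the experiments.

```python
def langmuir(conc, kd, rmax):
    """Steady-state response of a 1:1 interaction: R_eq = Rmax * C / (KD + C)."""
    return rmax * conc / (kd + conc)


def fit_steady_state(concs, responses, kd_min=1e-2, kd_max=1e5, steps=4000):
    """Least-squares fit of KD and Rmax to steady-state responses.

    KD is returned in the same units as concs. For a fixed KD the best Rmax
    has a closed form, so only KD is scanned, on a log-spaced grid.
    """
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        shape = [c / (kd + c) for c in concs]
        denom = sum(f * f for f in shape)
        if denom == 0.0:
            continue
        rmax = sum(r * f for r, f in zip(responses, shape)) / denom
        sse = sum((r - rmax * f) ** 2 for r, f in zip(responses, shape))
        if best is None or sse < best[0]:
            best = (sse, kd, rmax)
    if best is None:
        raise ValueError("need at least one non-zero concentration")
    return best[1], best[2]


# Synthetic two-fold dilution series from about 19.5 to 5000 nM (illustrative only),
# generated with an assumed KD of 52.8 nM and an assumed Rmax of 100 RU.
concs = [5000.0 / 2 ** k for k in range(8, -1, -1)]
responses = [langmuir(c, 52.8, 100.0) for c in concs]
kd, rmax = fit_steady_state(concs, responses)
print(f"fitted KD ~ {kd:.1f} nM, Rmax ~ {rmax:.1f} RU")
```

A grid search stands in for a nonlinear optimizer here to keep the example dependency-free; with these defaults the KD resolution is about 0.4%.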


Crystallization 


Crystallization trials were performed by mixing 1 μL protein with 1 μL reservoir solution
and equilibrating against 100 μL reservoir solution at 4 °C using the sitting-drop vapor-diffusion
method. Initial crystallization conditions were screened using commercially available kits.
Diffraction-quality crystals of the HKU9-RBD protein were finally obtained in a condition consisting of
0.1 M sodium citrate tribasic dihydrate pH 7.0 and 12% PEG 20,000 with a protein concentration
of 2.2 mg/mL. Derivative crystals were obtained by soaking the crystals in reservoir solution
containing 1 mM KAuBr4·2H2O for 48 h at 4 °C.


Data collection, integration and structure determination 


For data collection, all crystals were flash-cooled in liquid nitrogen after brief soaking in
reservoir solution supplemented with 20% v/v glycerol. Diffraction data for the native
(wavelength: 1.03906 Å) and Au-derivative (wavelength: 1.03906 Å) crystals of HKU9-RBD
were collected at beamline BL17U of the Shanghai Synchrotron Radiation Facility (SSRF). All data were
processed with HKL2000. Ice rings that formed during flash cooling were
excluded in data processing, and the final overall completeness of the dataset is 97.1%.
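As a quick worked check on the collection wavelength, the photon energy follows from E = hc/λ with hc ≈ 12.3984 keV·Å, giving about 11.93 keV for 1.03906 Å. That lies just above the tabulated Au L-III absorption edge (roughly 11.92 keV), where gold gives a strong anomalous signal. The text does not say how the wavelength was chosen, so this is an interpretation; the edge value in the sketch is an approximate tabulated number supplied for illustration, and the function name is hypothetical.

```python
# h*c expressed in keV * angstrom (CODATA value, rounded)
HC_KEV_ANGSTROM = 12.398419843


def wavelength_to_kev(wavelength_angstrom: float) -> float:
    """Photon energy (keV) for an X-ray wavelength given in angstroms."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


energy = wavelength_to_kev(1.03906)   # wavelength used for both datasets
au_l3_edge_kev = 11.919               # approximate tabulated Au L-III edge (assumption)
offset_ev = 1000 * (energy - au_l3_edge_kev)
print(f"{energy:.3f} keV, about {offset_ev:.0f} eV above the Au L-III edge")
```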


The structure of HKU9-RBD was determined by the SAD method. After the Au sites were located with
SHELXD using the Au-SAD data, the site positions were refined and the phases
were calculated with the SAD experimental phasing module of PHASER. Real-space
constraints were then applied to the electron density map with DM. The initial model was built
with AutoBuild in the PHENIX package. Missing residues were added manually in
COOT. The final model was refined with phenix.refine in PHENIX with energy
minimization, isotropic ADP refinement and bulk solvent modeling. The stereochemical
quality of the final model was assessed with MolProbity. The Ramachandran plot
distribution for residues in the HKU9-RBD structure was 94.64%, 5.36% and 0% for the favored,
allowed and outlier regions, respectively. Data collection and refinement statistics are
summarized in Table 1. All structural figures were generated using PyMOL
(http://www.pymol.org).


Results 
HKU9-RBD does not bind either the SARS-CoV or the MERS-CoV receptor 


We first characterized the sequence of BatCoV HKU9 S using a series of bioinformatic
methods. This 1274-residue protein exhibits typical features of coronavirus S proteins (e.g. the
presence of the characteristic heptad repeats 1 and 2 in the S2 subunit), though no potential
S1/S2 cleavage site for furin-like proteases was detected (Fig. 1A). Over the full-length
protein, the amino acid sequence identity between BatCoV HKU9 S and other betaCoV S proteins
is rather limited (e.g. 27.9% to MERS-CoV S, 28.0% to HKU4 S and 30.4% to SARS-CoV S).
Nevertheless, we were able to identify the RBD region based on the characteristic cysteine
residues of the core subdomain (Fig. 1B), which, in all RBD structures available so far
[35, 38, 39, 56], form three conserved disulphide bonds stabilizing the core fold. HKU9-RBD was
accordingly assigned to the S region spanning residues 355-521 (Fig. 1A). Compared with other
RBD sequences, HKU9-RBD has a core subdomain of comparable length (Fig. 1B) but a
dramatically shortened external region (Fig. 1C).
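Percent identities like those quoted above depend on the alignment and on the normalization used, which the text does not specify. A minimal sketch, assuming two already-aligned sequences of equal length with "-" as the gap character and dividing by the number of columns where neither sequence has a gap; other conventions divide by the shorter sequence or by the full alignment length and can give noticeably different values. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    paired = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not paired:
        return 0.0
    identical = sum(1 for x, y in paired if x.upper() == y.upper())
    return 100.0 * identical / len(paired)


# Toy alignment (hypothetical sequences, for illustration only)
print(f"{percent_identity('CYG-KLTC', 'CYSAKITC'):.1f}% identity")
```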


To test whether HKU9-RBD could react with either the SARS-CoV receptor ACE2 or the MERS-CoV
receptor CD26, the RBD and receptor-ectodomain proteins were individually prepared
in insect cells and purified to homogeneity. The ligand/receptor interactions were then
characterized by SPR (Biacore®) by flowing ACE2 or CD26 over the immobilized RBD
proteins. As expected, potent interactions were observed for both the SARS-RBD/ACE2 (KD
value: 0.265 μM) (Fig. 2A) and the MERS-RBD/CD26 (KD value: 52.8 nM) (Fig. 2B) binding
pairs. These affinities were very similar to those reported in previous studies,
validating the integrity of our testing system. Under the same conditions, however, neither ACE2
(Fig. 2C) nor CD26 (Fig. 2D) interacted with HKU9-RBD. To exclude the possibility
that HKU9-RBD was non-functional because of immobilization or the absence of important
post-translational modifications, we purified mFc-fused HKU9-RBD proteins expressed
in mammalian (293T) cells and assessed their binding to CD26 or ACE2 using
a capture SPR method. Again, there was no detectable binding of mFc-fused HKU9-RBD
to ACE2 or CD26 (Fig. 1G), while the mFc-fused SARS-RBD protein bound well to ACE2
(Fig. 1E) and the mFc-fused MERS-RBD bound well to CD26 (Fig. 1F). BatCoV HKU9,
therefore, is unlikely to use either the SARS-CoV receptor or the MERS-CoV receptor for cell
entry; it presumably relies on a different cellular receptor.


Crystal structure of HKU9-RBD 


We further set out to investigate the structural features of HKU9-RBD by crystallography. The
protein was successfully crystallized, a dataset at 2.1 Å resolution was collected (Table 1),
and the structure was solved by the single-wavelength anomalous diffraction (SAD) method.
The solved structure, with Rwork = 0.1700 and Rfree = 0.2006, contains a single molecule in
the crystallographic asymmetric unit. Clear electron density was traced for 166 consecutive
HKU9-RBD residues, extending from S355 to A520. These amino acids fold into a compact
structure, which can be further divided into two subdomains, as in the other RBD
structures [35, 38, 39]. The core subdomain comprises 8 β-strands and 6 helices (α or 3₁₀). Five long
strands (βc1-βc5) are arranged in an anti-parallel manner, forming the scaffold center of the
core (core-center). This core-center sheet is further wrapped by the surface helices and loops.
Notably, the 6 helices (H1-H6) are sporadically distributed on the two faces of the sheet,
leading to an overall globular fold for the core subdomain. On one lateral side of the
core-center sheet, the external subdomain covers the core like a hat, while on the opposite distal
side, three small strands (βp1-βp3) constitute a parallel peripheral sheet (core-peripheral),
clinching the N- and C-termini of HKU9-RBD in close proximity. As expected, the
characteristic cysteine residues (Fig. 1B) form three disulphide bonds in the core subdomain,
further stabilizing the core structure from the interior. Of these, two (C357/C381 and
C411/C517) are located in the core-peripheral region, contributing to the orientation of the RBD
termini; one (C399/C452) resides in the core-center, linking strands βc2 and βc4 (Fig. 3). Overall,
the residue boundaries of the core subdomain observed in the structure are quite consistent with
those deduced from the sequence alignment (Fig. 1B).


The external subdomain of HKU9-RBD consists of 42 residues from L458 to V499 (Fig. 1C).
These amino acids extend out of strand βc4 of the core-center sheet, first run as a loop along
the core subdomain like a clamp, then fold back to form a solvent-exposed α-helix (H1’) and
finally proceed into core strand βc5 (Fig. 3). This structure differs dramatically
from those of SARS-RBD and MERS-RBD, which are devoid of any helical
components in the external region. The unique external fold of HKU9-RBD could well
explain its inability to bind either ACE2 or CD26.
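Residue spans in this section use inclusive numbering, so a segment from residue m to residue n contains n - m + 1 residues; by this count the S355-A520 span traced in the electron density covers 166 residues and the L458-V499 external subdomain covers 42. A small sketch that parses labels such as "S355" and applies that rule; the helper names are hypothetical, and the label format (one uppercase amino-acid letter followed by a number) is an assumption.

```python
import re


def residue_number(label: str) -> int:
    """Extract the sequence number from a label such as 'S355' or 'A520'."""
    match = re.fullmatch(r"[A-Z](\d+)", label.strip())
    if match is None:
        raise ValueError(f"unexpected residue label: {label!r}")
    return int(match.group(1))


def span_length(first: str, last: str) -> int:
    """Number of residues from first to last, both ends included."""
    start, end = residue_number(first), residue_number(last)
    if end < start:
        raise ValueError("last residue precedes first residue")
    return end - start + 1


for name, first, last in [
    ("HKU9-RBD construct", "S355", "N521"),
    ("modelled HKU9-RBD", "S355", "A520"),
    ("external subdomain", "L458", "V499"),
]:
    print(f"{name}: {span_length(first, last)} residues")
```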


Structural conservation of the RBD core subdomain in betaCoVs 
Previously, three betaCoV RBD structures have been reported, including one lineage B structure
(SARS-RBD) and two lineage C structures (MERS-RBD and HKU4-RBD). These
structures indicated interspecies conservation of the core fold among betaCoVs. BatCoV
HKU9 is a representative member of betaCoV lineage D. We therefore compared the
currently available RBD structures with the HKU9-RBD structure solved in this study. As
expected, significant similarity was observed in the core subdomain (Figs. 4A-D).
Superimposition of the core structures yielded root-mean-square deviation (rmsd) values
ranging from 0.66 to 2.82 Å (Table 2), demonstrating a closely similar core fold among the four
RBDs despite low sequence identity. The most conserved part is the core-center
sheet. This 5-stranded scaffold element, as well as the single inter-strand disulphide bond, is
invariably preserved in all the structures. In the core-peripheral region, however, small variations
in strand composition were noted. In HKU9-RBD, this region contains three short β-strands arranged
into a small parallel β-sheet. Both SARS-RBD and MERS-RBD retain two of these strands,
whereas HKU4-RBD is devoid of any detectable strand elements in this region. Despite the
observed differences in strand composition, the core-peripheral regions of the RBDs exhibit similar
orientations and share the same scheme of bringing the domain N-terminus into close
proximity to its C-terminus. Additional common features of the core-peripheral region are its two
disulphide bonds, which are structurally and topologically conserved in the four structures
(Figs. 4A-D).
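Core rmsd values of this kind are obtained after optimally superimposing matched atoms, usually Cα atoms. A minimal NumPy sketch of the standard Kabsch superposition follows. It assumes the two coordinate sets are already paired one-to-one, whereas in practice the pairing comes from a structure-based alignment and superposition programs differ in which atoms they keep, so this is not necessarily how the values in Table 2 were obtained. The function name is hypothetical and the coordinates are random toy points.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between paired point sets p and q (N x 3) after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("p and q must both have shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Cross-covariance matrix and its singular value decomposition.
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # Guard against a reflection so that the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Toy check: a rotated and translated copy superimposes with rmsd ~ 0.
rng = np.random.default_rng(0)
ca_a = rng.normal(size=(20, 3)) * 10.0
theta = np.deg2rad(35.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
ca_b = ca_a @ rot_z.T + np.array([5.0, -3.0, 1.0])
print(f"rmsd = {kabsch_rmsd(ca_a, ca_b):.2e}")
```

The determinant check matters: without it, the SVD solution can return a mirror image, which would underestimate the rmsd between structures of opposite handedness.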


In contrast to the conserved core, the external subdomains of the four RBDs are structurally
divergent. HKU9-RBD presents a single H1’ helix in the external region, whereas SARS-RBD is
loop-dominated but contains two extra small β-strands. The external subdomains of MERS-RBD
and HKU4-RBD, however, resemble each other and are predominantly a rigid β-sheet
composed of four β-strands. Despite this structural divergence, the external subdomains are
clearly topological equivalents in these structures, being present as an insertion between two
core-center strands (Figs. 4A-D).


Homologous interaction mode anchoring the external subdomain to the core subdomain 


By superimposing the available RBD structures, we unexpectedly identified two major
elements in the external subdomain that align well (Fig. 5A). The first element
(element 1) spans 7 residues (Y464-F470 in HKU9-RBD, Y438-R444 in SARS-RBD,
Y497-C503 in MERS-RBD and Y501-C507 in HKU4-RBD) (Figs. 1C and 5C) and runs along
the core subdomain surface, lodging between the H2 and H6 helices (based on the secondary
element definitions of HKU9-RBD) (Fig. 5A). The second element (element 2) contains 8 amino
acids (P471-Q478 in HKU9-RBD, K447-D454 in SARS-RBD, L517-S524 in MERS-RBD and
Y522-S529 in HKU4-RBD) (Figs. 1C and 5C), extending as a curved loop covering helix H6 of the
core subdomain (Fig. 5A). Interestingly, these two elements are “saddled” upon the core
helices, anchoring the external subdomain to the core subdomain (Fig. 5A). We therefore further
explored the amino acid interaction details at this core/external interface.


Each element residue was scrutinized for both side-chain orientation and inter-subdomain
interactions. To facilitate analysis and comparison, each amino acid in the two elements was
assigned a position marker (a-g for element 1 and a-h for element 2) (Figs. 5B and 5C). In
element 1, the a-residue is invariably a tyrosine in the four RBDs. This amino acid orients its
bulky side chain towards the core subdomain, providing strong hydrophobic contacts. An additional
side-chain H-bond is also observed at this position in HKU9-RBD. The b-residue extends
away from the core surface and shows little conservation. This residue, however, invariably
contributes to subdomain anchoring by providing a main-chain H-bond. Following b, the
amino acids preferentially face towards the core at position c and lie parallel to the
core surface at position d. Multiple van der Waals (vdW) contacts and conserved main-chain
H-bonds are observed at these two positions, respectively. A small discrepancy is seen in
SARS-RBD, which orients its c-residue outwards into the bulk solvent. The remaining three
element 1 amino acids, at positions e, f and g, are distant from the core subdomain and
completely solvent exposed, and therefore contribute little to the core/external interactions
(Figs. 5B and 5C).


In element 2, both the a- and d-residues are oriented parallel to the surface of the core
subdomain. This configuration allows these amino acids to provide apolar vdW contacts that
strengthen the core/external subdomain binding. Some amino acid conservation is
observed at position d, where a proline is favored to facilitate the turning of the loop. Following
these two positions, the b- and e-residues insert their side chains into two surface pockets of the
core subdomain. Position b is conservatively occupied by a hydrophobic, middle-sized
side chain (Val/Ile/Leu), which is accommodated in a shallow apolar pocket, creating strong
stacking forces that bond the core and external subdomains. In addition, this residue contributes a
main-chain H-bond to the subdomain binding. For the e-residue, the accommodating pocket is
deep and large, allowing amino acid variation at this position (Gly in HKU9- and HKU4-RBD,
Phe in SARS-RBD and Asn in MERS-RBD). In all four RBDs, the e-residue
invariably H-bonds with the core subdomain via its main-chain atoms, but may additionally
provide side-chain H-bond interactions (e.g. in MERS-RBD) or multiple vdW contacts (e.g.
in SARS-RBD). Additional core/external interactions in this region are observed at position
g, where the residue is oriented either parallel to or towards the core subdomain and thereby
contributes to binding via hydrophobic and side-chain H-bond interactions. It is also of
interest that these important interface residues of element 2 are regularly interspersed with amino
acids at positions c, f and h, which are solvent exposed and rarely interact with the core
subdomain (Figs. 5B and 5C).
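The position markers amount to numbering each element from its first residue. The sketch below encodes the element boundaries listed in this section (residue numbers only), maps them onto the letters a-g and a-h, and looks up which residue occupies a given position. The dictionary layout and function names are hypothetical conveniences for illustration, not part of the original analysis.

```python
from string import ascii_lowercase

# Element boundaries (first, last residue number) as listed in the text.
ELEMENTS = {
    "element 1": {"HKU9-RBD": (464, 470), "SARS-RBD": (438, 444),
                  "MERS-RBD": (497, 503), "HKU4-RBD": (501, 507)},
    "element 2": {"HKU9-RBD": (471, 478), "SARS-RBD": (447, 454),
                  "MERS-RBD": (517, 524), "HKU4-RBD": (522, 529)},
}


def position_map(first: int, last: int) -> dict:
    """Assign position letters a, b, c, ... to residues first..last (inclusive)."""
    length = last - first + 1
    if not 0 < length <= len(ascii_lowercase):
        raise ValueError("invalid element boundaries")
    return {ascii_lowercase[i]: first + i for i in range(length)}


def residue_at(element: str, rbd: str, position: str) -> int:
    """Residue number occupying a position letter within an element of a given RBD."""
    first, last = ELEMENTS[element][rbd]
    return position_map(first, last)[position]


for element, rbds in ELEMENTS.items():
    for rbd, (first, last) in rbds.items():
        letters = "".join(position_map(first, last))
        print(f"{element} {rbd}: positions {letters[0]}-{letters[-1]}")
```

For example, `residue_at("element 1", "HKU9-RBD", "a")` returns 464, the tyrosine that the text identifies as invariant at position a.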


In summary, the four coronaviral RBD structures present homologous amino acid interaction
patterns for inter-subdomain binding. The binding relies mainly on two elements in the
external subdomain, which are oriented similarly in these structures (Fig. 5A). Despite limited
conservation of the element sequences, the side-chain orientation and the interaction modes
(hydrophobic, vdW or H-bond contacts) at each position are, in most cases, similar or
homologous (Fig. 5C).


Discussion 


Bats have been implicated as harboring the largest natural genetic pools of new coronaviruses or
coronaviral genes. The majority of betaCoVs can be traced back to bats in origin; e.g.
a recent study isolated, from Chinese horseshoe bats, a live SARS-like coronavirus that can use
the SARS-CoV receptor ACE2 for cell entry, thereby providing the strongest evidence for
the bat origin of this pandemic human pathogen. In addition, two studies reported the
identification in bats of gene fragments that are almost identical to those of MERS-CoV,
indicating that MERS-CoV likely also originated from bats. Given the recent reports showing the
adaptation of BatCoV HKU4 for binding to human cells via recognition of CD26, it is urgent
to prepare for unforeseeable events of potential interspecies transmission by other bat-derived betaCoVs.

ACS Paragon Plus Environment

Biochemistry

BatCoV HKU9 is an important lineage D betaCoV and has been
shown to be widespread and circulating in different bat species. Given the
determinative role of the coronaviral S RBD in crossing species barriers (as has
been structurally illustrated in other coronaviruses), we characterized the homologous
RBD protein of BatCoV HKU9 S with respect to its structural and functional features. The solved
structure revealed a core subdomain that resembles those observed in SARS-, MERS- and HKU4-RBD
but a unique external subdomain containing a single helix. Since the RBD external
subdomain contains the key motif interfacing with the receptor (the receptor binding motif, RBM),
the unique external fold of HKU9-RBD accords well with our
functional data showing its inability to react with either ACE2 or CD26. It remains to be
determined which host molecule HKU9-RBD recognizes as a functional cell entry
receptor. Nevertheless, given the single helical component in the external
subdomain of HKU9-RBD, the RBM is expected to be located on the solvent-accessible side of the
helix, which might facilitate future efforts at receptor identification.


It should be noted that coronavirus RBDs are not necessarily located in the C-terminal half of the S1 subunit. Previous structural and mutagenesis data showed that the RBD of MHV S lies in the N-terminal half of S1 (NTD)*” **. Nevertheless, current data seem to favor the notion that the C-terminal domain (CTD), rather than the NTD, generally serves as the receptor binding entity, because the majority of coronaviruses (e.g., SARS-CoV*’, MERS-CoV*” , batCoV HKU4*° , human coronavirus NL63° 7 and transmissible gastroenteritis virus”) use a CTD as the RBD. Interestingly, all currently available betaCoV CTD/RBD structures*” ** *? hold the N- and C-termini together on the side opposite the external subdomain. This arrangement would place the N-terminal half of S1 sterically beneath the C-terminal half, thereby projecting the CTD away from the viral envelope for a trans-interaction with the receptor. Our structure shows that HKU9-RBD retains this feature and is therefore, rather than the NTD, more likely to be the authentic receptor binding entity.


Compared with SARS-, MERS- and HKU4-RBD, whose structures are available” °° a, HKU9-RBD differs in its external fold but retains a similar core subdomain structure. The most conserved part is the core-center sheet, which is composed of five antiparallel strands and serves as the scaffold of the core subdomain. This sheet is spatially and structurally conserved in all the RBD structures. Additional conserved elements include the core-center helices and the core-peripheral region, although these elements can vary in secondary-structure composition. For example, a recent study of the HKU4-RBD structure*® showed that its N-terminal-most segment does not fold into the characteristic helix observed in the structures of SARS-** and MERS-RBD”. Likewise, the number of core-peripheral strands varies from zero (HKU4-RBD) to three (HKU9-RBD) (Fig. 4). These differences considerably complicate the nomenclature of RBD secondary structure elements, and the situation becomes worse when the external subdomains, which can vary substantially in structure, are also taken into account. We therefore suggest designating the core-center strands and helices as βc (βc1-βc5) and H (H1, H2, etc.), respectively, the core-peripheral strands as βp (βp1, βp2, etc.), and the external elements as H' (H1', H2', etc.) or β' (β1', β2', etc.). This terminology should facilitate comparison of homologous RBD structures and reflects the fact that the external subdomain is topologically an insertion between two equivalent core-center strands.
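The proposed labeling scheme can be read as a lookup from (region, element type) to a label prefix and suffix. The Python sketch below is illustrative only: the input encoding, the function name `label_elements`, and the rule that each class is numbered separately in sequence order are assumptions of this sketch rather than part of the published scheme.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: each secondary-structure element is a (region, element_type) pair,
# listed in sequence order along the RBD.
Element = Tuple[str, str]

# (prefix, suffix) for each combination named in the proposed scheme.
_SCHEME: Dict[Element, Tuple[str, str]] = {
    ("core-center", "strand"): ("βc", ""),
    ("core-center", "helix"): ("H", ""),
    ("core-peripheral", "strand"): ("βp", ""),
    ("external", "helix"): ("H", "'"),
    ("external", "strand"): ("β", "'"),
}


def label_elements(elements: List[Element]) -> List[str]:
    """Label elements under the scheme, numbering each class in order of appearance (assumed)."""
    counters = {key: 0 for key in _SCHEME}
    labels = []
    for region, kind in elements:
        key = (region, kind)
        if key not in _SCHEME:
            # The scheme does not name this combination, so refuse to invent a label.
            raise ValueError(f"no label defined for a {kind} in the {region} region")
        counters[key] += 1
        prefix, suffix = _SCHEME[key]
        labels.append(f"{prefix}{counters[key]}{suffix}")
    return labels
```

For instance, the invented input `[("core-center", "helix"), ("core-center", "strand"), ("external", "helix"), ("core-center", "strand"), ("core-peripheral", "strand")]`, which is not the HKU9-RBD topology, yields `["H1", "βc1", "H1'", "βc2", "βp1"]`. Combinations the text does not name, such as a core-peripheral helix, raise an error instead of receiving a made-up label.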


The long evolutionary history, high mutation rates and varied genetic strategies of RNA viruses often confound studies of virus origin. Evolutionary traces are even harder to track in viral surface proteins, which are normally under strong selective pressure. Evolutionary records, however, are more likely to be conserved in tertiary structures than in amino acid sequences. In betaCoVs, the inter-lineage sequence identity of the S RBD is rather limited. Nevertheless, we observed several conserved features in the betaCoV RBD structures: 1) a conserved core-center serving as the scaffold of the core subdomain; 2) a similar core-peripheral region in which the RBD termini are held in proximity; 3) a similar topological arrangement of the external subdomain as an insertion between two core strands; and 4) a homologous inter-subdomain binding mode anchoring the external subdomain to the core subdomain. These features indicate a common ancestral S protein that has diverged into the S proteins of different virus species. During evolution, the core subdomain has been structurally conserved, whereas the external subdomain has folded into variant structures to engage different receptors. It is also noteworthy that these core features have been structurally validated in betaCoV lineages B (SARS-RBD), C (MERS- and HKU4-RBD) and D (HKU9-RBD), but not yet in lineage A. Structural studies on the equivalent S RBDs of lineage A members should therefore be conducted in the future.


FIGURES 


Figure 1. Sequence features of HKU9-RBD. (A) A schematic representation of the BatCoV HKU9 S protein. The indicated domain elements were defined either from pairwise sequence alignments or from bioinformatic predictions. The signal peptide (SP), transmembrane domain (TM) and heptad repeats 1 and 2 (HR1 and HR2) were predicted with the SignalP 4.0 server, the TMHMM server and the Learncoil-VMF program, respectively, while the N-terminal domain (NTD) and RBD were deduced by alignment with the N-terminal galectin-like domain of murine hepatitis virus S and with MERS-RBD, respectively. The S1/S2 site potentially cleaved by furin-like proteases could not be ascertained and is therefore labeled with a question mark. (B, C) Structure-based alignment of the HKU9-, SARS-, MERS- and HKU4-RBD sequences. Arrows and spiral lines indicate strands and helices, respectively; these secondary structure elements are labeled as in Fig. 3. The conserved cysteine residues that form three disulphide bonds in the structures are marked with Arabic numerals 1-3. Because the core subdomain is conserved among the four RBD structures whereas the external subdomain is structurally unrelated, the two subdomain sequences are presented separately. The two elements that anchor the external subdomain to the core subdomain are highlighted with black boxes. (B) Core subdomain sequence. (C) External subdomain sequence.


Figure 2. Characterization of HKU9-RBD by SPR assays. The indicated RBD proteins expressed in insect cells were immobilized on CM5 chips and tested for binding to gradient concentrations of human ACE2 or CD26 on a BIAcore® 3000 instrument. The recorded kinetic profiles are shown. (A) Human ACE2 to SARS-RBD. (B) Human CD26 to MERS-RBD. (C) Human ACE2 to HKU9-RBD. (D) Human CD26 to HKU9-RBD. HKU9-RBD binds neither ACE2 nor CD26, whereas SARS-RBD and MERS-RBD bind their respective receptors. We then purified mFc-fused RBD proteins expressed in mammalian (293T) cells and assessed their binding to CD26 or ACE2 by a capture SPR method on a BIAcore® T100 system. Anti-mouse antibodies were immobilized on CM5 chips; the mFc-fused RBD proteins were then captured by the antibodies (3 μg/mL for 60 seconds) and tested for binding to human ACE2 or CD26. (E) The mFc-fused SARS-RBD (SARS-RBD-mFc) bound ACE2 well but did not bind CD26. (F) The mFc-fused MERS-RBD (MERS-RBD-mFc) bound CD26 well but did not bind ACE2. (G) The mFc-fused HKU9-RBD (HKU9-RBD-mFc) bound neither ACE2 nor CD26.


Figure 3. Crystal structure of HKU9-RBD. The core and external subdomains are colored magenta and green, respectively. The core subdomain is further divided into a center region (core-center) and a peripheral region (core-peripheral), which are circled for emphasis. The core-center strands and helices are labeled βc1-βc5 and H1-H6, respectively, and the core-peripheral strands βp1-βp3. The disulphide bonds and the RBD termini are labeled. In the right panel, the core subdomain is shown as a surface to highlight how the external subdomain sits on top of it like a hat.


Figure 4. Structural and topological comparison of the currently available betaCoV RBD structures. The four structures (HKU9-, SARS-, MERS- and HKU4-RBD) are oriented similarly and shown side by side as cartoons. The core-center, the core-peripheral and the external subdomain are circled in yellow. For each structure, the topological arrangement of the core-center and core-peripheral strands and of the external components is depicted. The core strands that flank the external subdomain are colored red and blue, respectively. Yellow lines indicate disulphide bonds. The N- and C-termini are highlighted. (A) HKU9-RBD. (B) SARS-RBD. (C) MERS-RBD. (D) HKU4-RBD. In all four structures, the external subdomain is similarly arranged as an insertion between two core strands.
Figure 5. Homologous inter-subdomain amino acid interactions anchoring the external subdomain to the core subdomain. (A) Superimposition of the betaCoV RBD structures (HKU9-RBD in green, SARS-RBD in yellow, MERS-RBD in blue and HKU4-RBD in cyan), highlighting the external elements that align well. These two elements, of 7 (element 1) and 8 (element 2) amino acids, respectively, engage mainly the core subdomain helices H2 and H6 in the inter-subdomain interactions. To facilitate comparison, the element residues were successively assigned position markers (a-g for element 1 and a-h for element 2), which are highlighted. (B) Contributions of the element residues to inter-subdomain binding. The two external elements are shown as cartoons and the core subdomain as a surface. At each position, the residue is annotated with the position marker, the amino acid identity and number, the interaction mode and the side-chain orientation. Interaction modes are indicated by circled letters: P for hydrophobic or van der Waals interactions, S for side-chain H-bonds and M for main-chain H-bonds. Side-chain orientations are indicated with arrows. (C) Summary of the inter-subdomain interactions specified in (B). The element sequences of the four RBDs are aligned and listed. “+” indicates that a given interaction type at the position is common to all four RBDs, while “+/-” indicates that the interaction type is found in some but not all of the four RBDs. Arrows mark the side-chain orientations.
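The “+” / “+/-” rule in panel (C) amounts to counting, for each position and interaction mode, how many of the four RBDs show that mode. A minimal Python sketch of that rule follows; the data layout and the function name `summarize_position` are hypothetical, and any inputs passed to it here are placeholders rather than the observed contacts.

```python
from typing import Dict, Set

# Interaction-mode codes from the legend: P = hydrophobic/van der Waals contact,
# S = side-chain H-bond, M = main-chain H-bond.
MODES = ("P", "S", "M")


def summarize_position(modes_by_rbd: Dict[str, Set[str]]) -> Dict[str, str]:
    """Summarize one position: '+' if every RBD shows a mode, '+/-' if only some do, '' if none."""
    n_rbd = len(modes_by_rbd)
    summary: Dict[str, str] = {}
    for mode in MODES:
        # Count the RBDs whose residue at this position shows the mode.
        n_with_mode = sum(1 for modes in modes_by_rbd.values() if mode in modes)
        if n_rbd > 0 and n_with_mode == n_rbd:
            summary[mode] = "+"
        elif n_with_mode > 0:
            summary[mode] = "+/-"
        else:
            summary[mode] = ""
    return summary
```

With the placeholder input `{"rbd_a": {"P", "M"}, "rbd_b": {"P"}, "rbd_c": {"P", "S"}, "rbd_d": {"P", "M"}}`, the sketch returns `{"P": "+", "S": "+/-", "M": "+/-"}`.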
Table 1. Data collection and refinement statistics.

| | HKU9-RBD (PDB code 5GYQ) | Au derivative HKU9-RBD |
|---|---|---|
| Data collection | | |
| Space group | P2₁ | P1 |
| Wavelength (Å) | 1.03906 | 1.03906 |
| Unit cell dimensions: a, b, c (Å) | 42.7, 36.0, 62.9 | 36.0, 46.6, 57.3 |
| Unit cell dimensions: α, β, γ (°) | 90.0, 102.7, 90.0 | 80.4, 88.8, 88.5 |
| Resolution (Å)^a | 50.00-2.10 (2.18-2.10) | 50.00-2.48 (2.57-2.48) |
| Observed reflections | 101,588 | 52,401 |
| Completeness (%) | 97.1 (80.7) | 97.7 (96.9) |
| Redundancy | 9.4 (9.4) | 4.1 (3.7) |
| R_merge (%)^b | 6.1 (15.4) | 9.2 (39.0) |
| I/σI | 34.023 (12.057) | 16.960 (4.637) |
| CC1/2 | 0.998 (0.988) | 0.986 (0.915) |
| Refinement | | |
| Resolution (Å) | 41.7-2.10 | |
| Number of reflections | 10,811 | |
| Completeness for range (%) | 97.0 | |
| R_work/R_free^c | 0.1700/0.2006 | |
| No. atoms: protein | 1367 | |
| No. atoms: water | 128 | |
| B-factors (Å²): protein | 28.7 | |
| B-factors (Å²): water | 34.1 | |
| R.m.s. deviations: bond lengths (Å) | 0.003 | |
| R.m.s. deviations: bond angles (°) | 0.820 | |
| Ramachandran plot^d: favored (%) | 94.64 | |
| Ramachandran plot^d: allowed (%) | 5.36 | |
| Ramachandran plot^d: outliers (%) | 0.00 | |


^a Values for the outermost resolution shell are given in parentheses.

^b $R_{\mathrm{merge}} = \sum_{hkl}\sum_{i} \lvert I_i(hkl) - \langle I(hkl)\rangle \rvert \,/\, \sum_{hkl}\sum_{i} I_i(hkl)$, where $I_i(hkl)$ is the observed intensity and $\langle I(hkl)\rangle$ is the average intensity from multiple measurements.

^c $R_{\mathrm{work}} = \sum_{hkl} \big\lvert \lvert F_o \rvert - \lvert F_c \rvert \big\rvert \,/\, \sum_{hkl} \lvert F_o \rvert$, where $\lvert F_o \rvert$ and $\lvert F_c \rvert$ are the structure-factor amplitudes from the data and the model, respectively. $R_{\mathrm{free}}$ is the R factor for a subset (5%) of reflections that was selected prior to refinement and was not included in the refinement.

^d Ramachandran plots were generated using the program MolProbity.
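The statistics defined in footnotes b and c follow directly from their formulas. The Python sketch below is illustrative only; the function names are hypothetical, and it assumes that intensities have already been mapped to unique (symmetry-reduced) hkl indices and that calculated amplitudes are already on the same scale as the observed ones. Real data-processing and refinement programs handle scaling, outlier rejection and free-set selection themselves, so this sketch is not a substitute for the crystallographic software behind Table 1.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

HKL = Tuple[int, int, int]


def r_merge(observations: Iterable[Tuple[HKL, float]]) -> float:
    """R_merge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i (hkl assumed symmetry-reduced)."""
    groups: Dict[HKL, List[float]] = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum | |Fo| - |Fc| | / sum |Fo|, with Fc assumed already scaled to Fo."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_r_free(f_obs: Sequence[float], f_calc: Sequence[float],
                  is_free: Sequence[bool]) -> Tuple[float, float]:
    """Split reflections by a precomputed free-set flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free
```

Because the free reflections are excluded from refinement, R_free computed this way on the held-out subset is the cross-validation counterpart of R_work.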
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1 Table 2. Statistics of the core subdomain deviations among thus-far available betaCoV 
2  RBD structures. 
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HKU9-RBD MERS-RBD SARS-RBD HKU4-RBD 


11 HKU9-RBD — 2.07 A (104 1.92 A (100 1.37 A (94 Ca) 
12 Ca) Ca) 


14 MERS-RBD — 2.82 A (75 Ca) | 0.66 A (109 
15 Ca) 


17 SARS-RBD = 1.95 A (82 Ca) 


20 HKU4-RBD = 


24 5 The RBD core subdomain structures were superimposed onto each other by PyMol in a pairwise 
26 6 manner to calculate the root-mean-square deviation (rmsd) values, which were listed in the table. 
99 7 The values in parentheses indicate the number of equivalent Ca atoms that were selected for 


31 8 rmsd calculations. 
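As a rough illustration of the calculation behind Table 2, the Python/NumPy sketch below computes the rmsd over N pre-paired equivalent Cα atoms after optimal rigid-body superposition with the Kabsch algorithm. It assumes the atom correspondence is already known and that NumPy is available; the function name `kabsch_rmsd` is hypothetical. PyMOL's alignment commands can also select the equivalent atoms and iteratively reject outliers, so values from this sketch need not match Table 2 exactly.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """Return the rmsd between two paired (N, 3) coordinate sets after optimal superposition.

    Row i of P is assumed to be the atom equivalent to row i of Q (e.g. paired C-alpha atoms).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering each set on its centroid.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # The SVD of the 3x3 covariance matrix gives the optimal rotation.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1), not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate the centered P onto Q and average the squared distances over the N atom pairs.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum() / P.shape[0]))
```

A rotated and translated copy of a coordinate set gives an rmsd of essentially zero, while a mirror image does not, because the sign correction restricts the fit to proper rotations.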
Figure 1. Sequence features of HKU9-RBD
Figure 3. Crystal structure of HKU9-RBD
Figure 4. Structural and topological comparison of thus-far available betaCoV RBD structures
Figure 5. Homologous inter-subdomain amino acid interactions anchoring the external subdomain to the core subdomain